1.
Genes Genet Syst ; 98(5): 221-237, 2023 Nov 21.
Article in English | MEDLINE | ID: mdl-37839865

ABSTRACT

Since the early phase of the coronavirus disease 2019 (COVID-19) pandemic, a number of research institutes have been sequencing and sharing high-quality severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes to trace the route of infection in Japan. To provide insight into the spread of COVID-19, we developed a web platform named SARS-CoV-2 HaploGraph to visualize the emergence timing and geographical transmission of SARS-CoV-2 haplotypes. Using data from the GISAID EpiCoV database as of June 4, 2022, we created a haplotype naming system by determining the ancestral haplotype for each epidemic wave and showed prefecture- or region-specific haplotypes in each of four waves in Japan. The SARS-CoV-2 HaploGraph allows for interactive tracking of virus evolution and of geographical prevalence of haplotypes, and aids in developing effective public health control strategies during the global pandemic. The code and the data used for this study are publicly available at: https://github.com/ktym/covid19/.


Subjects
COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/epidemiology , COVID-19/genetics , Haplotypes , Japan/epidemiology , Pandemics , Viral Genome
2.
Hum Genome Var ; 9(1): 44, 2022 Dec 12.
Article in English | MEDLINE | ID: mdl-36509753

ABSTRACT

TogoVar ( https://togovar.org ) is a database that integrates allele frequencies derived from Japanese populations and provides annotations for variant interpretation. First, a scheme to reanalyze individual-level genome sequence data deposited in the Japanese Genotype-phenotype Archive (JGA), a controlled-access database, was established to make allele frequencies publicly available. As more Japanese individual-level genome sequence data are deposited in JGA, the sample size employed in TogoVar is expected to increase, contributing to genetic study as reference data for Japanese populations. Second, public datasets of Japanese and non-Japanese populations were integrated into TogoVar to easily compare allele frequencies in Japanese and other populations. Each variant detected in Japanese populations was assigned a TogoVar ID as a permanent identifier. Third, these variants were annotated with molecular consequence, pathogenicity, and literature information for interpreting and prioritizing variants. Here, we introduce the newly developed TogoVar database that compares allele frequencies among Japanese and non-Japanese populations and describes the integrated annotations.

3.
Bioinformatics ; 38(17): 4194-4199, 2022 09 02.
Article in English | MEDLINE | ID: mdl-35801937

ABSTRACT

MOTIVATION: Understanding life cannot be accomplished without making full use of biological data, which are scattered across databases of diverse categories in life sciences. To connect such data seamlessly, identifier (ID) conversion plays a key role. However, existing ID conversion services have disadvantages, such as covering only a limited range of biological categories of databases, not keeping up with the updates of the original databases and outputs being hard to interpret in the context of biological relations, especially when converting IDs in multiple steps. RESULTS: TogoID is an ID conversion service implementing unique features with an intuitive web interface and an application programming interface (API) for programmatic access. TogoID currently supports 65 datasets covering various biological categories. TogoID users can perform exploratory multistep conversions to find a path among IDs. To guide the interpretation of biological meanings in the conversions, we crafted an ontology that defines the semantics of the dataset relations. AVAILABILITY AND IMPLEMENTATION: The TogoID service is freely available on the TogoID website (https://togoid.dbcls.jp/) and the API is also provided to allow programmatic access. To encourage developers to add new dataset pairs, the system stores the configurations of pairs at the GitHub repository (https://github.com/togoid/togoid-config) and accepts the request of additional pairs. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Data Management , Software , Factual Databases
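
The exploratory multistep conversion described in the TogoID abstract above amounts to finding a route through a graph whose nodes are datasets and whose edges are supported dataset pairs. The following is a minimal sketch of that idea only, not TogoID's implementation: the dataset names in `PAIRS` are hypothetical placeholders, and treating each pair as usable in both directions is an assumption made for illustration.

```python
from collections import deque

# Hypothetical dataset pairs, for illustration only. The real pair
# configurations are kept in the togoid-config repository.
PAIRS = [
    ("ncbigene", "ensembl_gene"),
    ("ensembl_gene", "ensembl_transcript"),
    ("ensembl_transcript", "uniprot"),
    ("uniprot", "pdb"),
]


def build_graph(pairs):
    """Treat every dataset pair as an undirected edge (an assumption)."""
    graph = {}
    for left, right in pairs:
        graph.setdefault(left, set()).add(right)
        graph.setdefault(right, set()).add(left)
    return graph


def find_route(graph, source, target):
    """Breadth-first search for a shortest chain of datasets, or None."""
    if source not in graph or target not in graph:
        return None
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        # Sorted iteration keeps the result deterministic.
        for neighbour in sorted(graph[path[-1]]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


if __name__ == "__main__":
    route = find_route(build_graph(PAIRS), "ncbigene", "pdb")
    print(" -> ".join(route))
```

Breadth-first search is used here because it returns a route with the fewest conversion steps; a real service may rank routes differently.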
5.
F1000Res ; 9: 136, 2020.
Article in English | MEDLINE | ID: mdl-32308977

ABSTRACT

We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types of relevance for the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress to address ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.


Subjects
Biological Science Disciplines , Computational Biology , Semantic Web , Data Mining , Metadata , Reproducibility of Results
6.
Methods Mol Biol ; 1910: 747-766, 2019.
Article in English | MEDLINE | ID: mdl-31278684

ABSTRACT

Open-source software encourages computer programmers to reuse software components written by others. In evolutionary bioinformatics, open-source software comes in a broad range of programming languages, including C/C++, Perl, Python, Ruby, Java, and R. To avoid writing the same functionality multiple times for different languages, it is possible to share components by bridging computer languages and Bio* projects, such as BioPerl, Biopython, BioRuby, BioJava, and R/Bioconductor. In this chapter, we compare the three principal approaches for sharing software between different programming languages: by remote procedure call (RPC), by sharing a local "call stack," and by calling program to programs. RPC provides a language-independent protocol over a network interface; examples are SOAP and Rserve. The local call stack provides a between-language mapping, not over the network interface but directly in computer memory; examples are R bindings, RPy, and languages sharing the Java virtual machine stack. This functionality provides strategies for sharing of software between Bio* projects, which can be exploited more often. Here, we present cross-language examples for sequence translation and measure throughput of the different options. We compare calling into R through native R, RSOAP, Rserve, and RPy interfaces, with the performance of native BioPerl, Biopython, BioJava, and BioRuby implementations and with call stack bindings to BioJava and the European Molecular Biology Open Software Suite (EMBOSS). In general, call stack approaches outperform native Bio* implementations, and these, in turn, outperform "RPC"-based approaches. To test and compare strategies, we provide a downloadable Docker container with all examples, tools, and libraries included.


Subjects
Computational Biology , Programming Languages , Software , Computational Biology/methods , User-Computer Interface , Web Browser
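
The benchmark task named in the abstract above is sequence translation. As a point of reference for what a "native" implementation of that task looks like, here is a minimal pure-Python sketch. It is not the chapter's code and does not use any Bio* library; it assumes the standard genetic code (NCBI translation table 1) and marks codons containing unrecognised characters with `X`.

```python
# Standard genetic code (NCBI translation table 1), laid out in the
# conventional TCAG order for the first, second and third codon base.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(sequence):
    """Translate a DNA or RNA string in reading frame 1.

    Stop codons become '*', codons with unknown characters become 'X',
    and a trailing partial codon is ignored.
    """
    dna = sequence.upper().replace("U", "T")
    usable_length = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[pos:pos + 3], "X")
        for pos in range(0, usable_length, 3)
    )


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
```

Exposing a function like this to another language through RPC, a shared call stack, or a child process is what the chapter compares; the sketch covers only the function being shared.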
7.
Database (Oxford) ; 2019, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30624651

ABSTRACT

TogoGenome is a genome database that is purely based on the Semantic Web technology, which enables the integration of heterogeneous data and flexible semantic searches. All the information is stored as Resource Description Framework (RDF) data, and the reporting web pages are generated on the fly using SPARQL Protocol and RDF Query Language (SPARQL) queries. TogoGenome provides a semantic-faceted search system by gene functional annotation, taxonomy, phenotypes and environment based on the relevant ontologies. TogoGenome also serves as an interface to conduct semantic comparative genomics by which a user can observe pan-organism or organism-specific genes based on the functional aspect of gene annotations and the combinations of organisms from different taxa. The TogoGenome database exhibits a modularized structure, and each module in the report pages is separately served as TogoStanza, which is a generic framework for rendering an information block as IFRAME/Web Components, which can, unlike several other monolithic databases, also be reused to construct other databases. TogoGenome and TogoStanza have been under development since 2012 and are freely available along with their source codes on the GitHub repositories at https://github.com/togogenome/ and https://github.com/togostanza/, respectively, under the MIT license.


Subjects
Genetic Databases , Genomics/methods , Semantic Web , Software , Humans
8.
Database (Oxford) ; 2018, 2018 01 01.
Article in English | MEDLINE | ID: mdl-30576482

ABSTRACT

In the life sciences, researchers increasingly want to access multiple databases in an integrated way. However, different databases currently use different formats and vocabularies, hindering the proper integration of heterogeneous life science data. Adopting the Resource Description Framework (RDF) has the potential to address such issues by improving database interoperability, leading to advances in automatic data processing. Based on this idea, we have advised many Japanese database development groups to expose their databases in RDF. To further promote such activities, we have developed an RDF-based life science dataset repository called the National Bioscience Database Center (NBDC) RDF portal. All the datasets in this repository have been reviewed by the NBDC to ensure interoperability and queryability. As of July 2018, the service includes 21 RDF datasets, comprising over 45.5 billion triples. It provides SPARQL endpoints for all datasets, useful metadata and the ability to download RDF files. The NBDC RDF portal can be accessed at https://integbio.jp/rdf/.


Subjects
Biological Science Disciplines , Database Management Systems , Factual Databases , Semantics , Internet , User-Computer Interface
10.
Nucleic Acids Res ; 45(D1): D25-D31, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27924010

ABSTRACT

The DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp) has been providing public data services for thirty years (since 1987). We collect nucleotide sequence data from researchers as a member of the International Nucleotide Sequence Database Collaboration (INSDC, http://www.insdc.org), in collaboration with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI). The DDBJ Center also operates the Japanese Genotype-phenotype Archive (JGA) with the National Bioscience Database Center to collect human-subject data from Japanese researchers. Here, we report our database activities for INSDC and JGA over the past year, and introduce retrieval and analytical services running on our supercomputer system and their recent modifications. Furthermore, with the Database Center for Life Science, the DDBJ Center improves semantic web technologies to integrate and share biological data and to provide the RDF version of the sequence data.


Subjects
Nucleic Acid Databases , DNA Sequence Analysis , Animals , Genotype , Humans , Internet , Japan , Molecular Sequence Annotation , Phenotype , Software
11.
PeerJ ; 4: e2331, 2016.
Article in English | MEDLINE | ID: mdl-27602295

ABSTRACT

Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.

12.
Nat Commun ; 7: 12808, 2016 09 20.
Article in English | MEDLINE | ID: mdl-27649274

ABSTRACT

Tardigrades, also known as water bears, are small aquatic animals. Some tardigrade species tolerate almost complete dehydration and exhibit extraordinary tolerance to various physical extremes in the dehydrated state. Here we determine a high-quality genome sequence of Ramazzottius varieornatus, one of the most stress-tolerant tardigrade species. Precise gene repertoire analyses reveal the presence of a small proportion (1.2% or less) of putative foreign genes, loss of gene pathways that promote stress damage, expansion of gene families related to ameliorating damage, and evolution and high expression of novel tardigrade-unique proteins. Minor changes in the gene expression profiles during dehydration and rehydration suggest constitutive expression of tolerance-related genes. Using human cultured cells, we demonstrate that a tardigrade-unique DNA-associating protein suppresses X-ray-induced DNA damage by ∼40% and improves radiotolerance. These findings indicate the relevance of tardigrade-unique proteins to tolerability and tardigrades could be a bountiful source of new protection genes and mechanisms.


Subjects
Physiological Adaptation/genetics , Genome , Tardigrada/genetics , Animals , DNA Damage , Horizontal Gene Transfer , HEK293 Cells , Humans , Peroxisomes , Physiological Stress/genetics , X-Rays
13.
J Biomed Semantics ; 7: 39, 2016 Jun 13.
Article in English | MEDLINE | ID: mdl-27296299

ABSTRACT

BACKGROUND: Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. For querying biological annotations with Semantic Web technologies, there was no standard that described this potentially complex location information as subject-predicate-object triples. DESCRIPTION: We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned "omics" areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. CONCLUSIONS: Our ontology allows users to uniformly describe - and potentially merge - sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federated SPARQL queries against public SPARQL endpoints and/or local private triple stores.


Subjects
Biological Ontologies , Molecular Sequence Annotation/standards , Nucleotides/genetics , Nucleotides/metabolism , Proteins/chemistry , Proteins/metabolism , Semantics , Genetic Databases , Protein Databases , Fuzzy Logic , Humans , Reference Books
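
To make the idea of describing a feature location as subject-predicate-object triples concrete, the sketch below emits a small Turtle snippet for one region using FALDO terms (`faldo:Region`, `faldo:begin`, `faldo:end`, `faldo:ExactPosition`, `faldo:position`, `faldo:reference`, and the strand classes). It is an illustrative sketch only: the function name and the `example.org` IRIs are hypothetical, and the output is built as plain text rather than with an RDF library.

```python
FALDO_PREFIX = "@prefix faldo: <http://biohackathon.org/resource/faldo#> ."

STRAND_CLASSES = {
    "+": "faldo:ForwardStrandPosition",
    "-": "faldo:ReverseStrandPosition",
}


def region_to_turtle(feature_iri, reference_iri, begin, end, strand="+"):
    """Return a Turtle description of one feature location.

    The feature and reference IRIs are supplied by the caller; the ones
    used in the demo below are placeholders, not real resources.
    """
    strand_class = STRAND_CLASSES[strand]

    def position(value):
        # Each boundary is a blank node carrying its coordinate,
        # its strand, and the sequence it is measured on.
        return (
            f"[ a faldo:ExactPosition, {strand_class} ;\n"
            f"        faldo:position {int(value)} ;\n"
            f"        faldo:reference <{reference_iri}> ]"
        )

    return "\n".join([
        FALDO_PREFIX,
        "",
        f"<{feature_iri}> faldo:location [",
        "    a faldo:Region ;",
        f"    faldo:begin {position(begin)} ;",
        f"    faldo:end {position(end)}",
        "] .",
    ])


if __name__ == "__main__":
    print(region_to_turtle(
        "http://example.org/feature/gene1",    # hypothetical feature IRI
        "http://example.org/reference/chr1",   # hypothetical sequence IRI
        100,
        200,
    ))
```

Because each boundary names its own reference sequence and strand, the same pattern can describe positions independently of the file format the annotation came from.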
14.
Nucleic Acids Res ; 44(D1): D51-7, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26578571

ABSTRACT

The DNA Data Bank of Japan Center (DDBJ Center; http://www.ddbj.nig.ac.jp) maintains and provides public archival, retrieval and analytical services for biological information. The contents of the DDBJ databases are shared with the US National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). Since 2013, the DDBJ Center has been operating the Japanese Genotype-phenotype Archive (JGA) in collaboration with the National Bioscience Database Center (NBDC) in Japan. In addition, the DDBJ Center develops semantic web technologies for data integration and sharing in collaboration with the Database Center for Life Science (DBCLS) in Japan. This paper briefly reports on the activities of the DDBJ Center over the past year including submissions to databases and improvements in our services for data retrieval, analysis, and integration.


Subjects
Nucleic Acid Databases , DNA Sequence Analysis , Biological Ontologies , Computers , Genotype , Phenotype
15.
J Biomed Semantics ; 6: 3, 2015.
Article in English | MEDLINE | ID: mdl-25973165

ABSTRACT

BACKGROUND: Linked Data has gained some attention recently in the life sciences as an effective way to provide and share data. As a part of the Semantic Web, data are linked so that a person or machine can explore the web of data. Resource Description Framework (RDF) is the standard means of implementing Linked Data. In the process of generating RDF data, not only are data linked to one another, but the links themselves are characterized by ontologies, thereby allowing the types of links to be distinguished. Although defining an ontology carries a high labor cost for data providers, the merit lies in the higher level of interoperability with data analysis and visualization software. This increase in interoperability facilitates the multi-faceted retrieval of data, and the appropriate data can be quickly extracted and visualized. Such retrieval is usually performed using SPARQL (SPARQL Protocol and RDF Query Language), which is used to query RDF data stores. For the database provider, such interoperability will surely lead to an increase in the number of users. RESULTS: This manuscript describes the experiences and discussions shared among participants of the week-long BioHackathon 2011 who went through the development of RDF representations of their own data and developed specific RDF and SPARQL use cases. Advice on what to consider when developing RDF representations is provided for bioinformaticians considering making their data available and interoperable. CONCLUSIONS: Participants of the BioHackathon 2011 were able to produce RDF representations of their data and gain a better understanding of the requirements for producing such data in a period of just five days. We summarize the work accomplished with the hope that it will be useful for researchers involved in developing laboratory databases or data analysis, and those who are considering such technologies as RDF and Linked Data.

16.
Bioinformatics ; 31(11): 1875-7, 2015 Jun 01.
Article in English | MEDLINE | ID: mdl-25638809

ABSTRACT

MOTIVATION: On the semantic web, in life sciences in particular, data is often distributed via multiple resources. Each of these sources is likely to use its own Internationalized Resource Identifier (IRI) for conceptually the same resource or database record. The lack of correspondence between identifiers introduces a barrier when executing federated SPARQL queries across life science data. RESULTS: We introduce a novel SPARQL-based service to enable on-the-fly integration of life science data. This service uses the identifier patterns defined in the Identifiers.org Registry to generate a plurality of identifier variants, which can then be used to match source identifiers with target identifiers. We demonstrate the utility of this identifier integration approach by answering queries across major producers of life science Linked Data. AVAILABILITY AND IMPLEMENTATION: The SPARQL-based identifier conversion service is available without restriction at http://identifiers.org/services/sparql.


Subjects
Factual Databases , Biological Science Disciplines , Internet , Semantics , Systems Integration
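
The variant-generation step described in the abstract above can be sketched in a few lines: recognise which URI pattern a source identifier follows, extract the local accession, and re-emit it under every other pattern registered for the same collection. This is a hedged sketch of the general idea, not the service's implementation. The pattern list is a hand-written illustration, not a copy of the Identifiers.org Registry, and the function names are hypothetical.

```python
# Illustrative URI patterns for one collection. These are assumptions
# written for this example; the real patterns come from the registry.
URI_PATTERNS = {
    "uniprot": [
        "http://identifiers.org/uniprot/{id}",
        "http://purl.uniprot.org/uniprot/{id}",
        "http://www.uniprot.org/uniprot/{id}",
    ],
}


def split_iri(iri):
    """Return (namespace, local_id) if the IRI fits a known pattern."""
    for namespace, patterns in URI_PATTERNS.items():
        for pattern in patterns:
            prefix, suffix = pattern.split("{id}")
            fits = iri.startswith(prefix) and iri.endswith(suffix)
            if fits and len(iri) > len(prefix) + len(suffix):
                return namespace, iri[len(prefix):len(iri) - len(suffix)]
    return None


def variants(iri):
    """List every known spelling of the same record, or just the input."""
    hit = split_iri(iri)
    if hit is None:
        return [iri]
    namespace, local_id = hit
    return [pattern.format(id=local_id) for pattern in URI_PATTERNS[namespace]]


def same_record(source_iri, target_iri):
    """True when the two IRIs are variants of one another."""
    return target_iri in variants(source_iri)


if __name__ == "__main__":
    for variant in variants("http://purl.uniprot.org/uniprot/P12345"):
        print(variant)
```

In a federated query, a list like the one returned by `variants` is what lets an identifier from one data source be matched against the spelling used by another.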
17.
Genome Biol ; 16: 22, 2015 Jan 05.
Article in English | MEDLINE | ID: mdl-25723102

ABSTRACT

The FANTOM5 project investigates transcription initiation activities in more than 1,000 human and mouse primary cells, cell lines and tissues using CAGE. Based on manual curation of sample information and development of an ontology for sample classification, we assemble the resulting data into a centralized data resource (http://fantom.gsc.riken.jp/5/). This resource contains web-based tools and data-access points for the research community to search and extract data related to samples, genes, promoter activities, transcription factors and enhancers across the FANTOM5 atlas.


Subjects
Genomics/methods , Genetic Promoter Regions , Software , Genetic Transcription Initiation , Animals , Computational Biology/methods , Genetic Databases , Datasets as Topic , Gene Expression Profiling , Humans , Mice , Transcriptome , User-Computer Interface
18.
PLoS One ; 10(2): e0118272, 2015.
Article in English | MEDLINE | ID: mdl-25675104

ABSTRACT

Tardigrades are able to tolerate almost complete dehydration through transition to a metabolically inactive state, called "anhydrobiosis". Late Embryogenesis Abundant (LEA) proteins are heat-soluble proteins involved in the desiccation tolerance of many anhydrobiotic organisms. Tardigrades, Ramazzottius varieornatus, however, express predominantly tardigrade-unique heat-soluble proteins: CAHS (Cytoplasmic Abundant Heat Soluble) and SAHS (Secretory Abundant Heat Soluble) proteins, which are secreted or localized in most intracellular compartments, except the mitochondria. Although mitochondrial integrity is crucial to ensure cellular survival, protective molecules for mitochondria have remained elusive. Here, we identified two novel mitochondrial heat-soluble proteins, RvLEAM and MAHS (Mitochondrial Abundant Heat Soluble), as potent mitochondrial protectants from Ramazzottius varieornatus. RvLEAM is a group3 LEA protein and immunohistochemistry confirmed its mitochondrial localization in tardigrade cells. MAHS-green fluorescent protein fusion protein localized in human mitochondria and was heat-soluble in vitro, though no sequence similarity with other known proteins was found, and one region was conserved among tardigrades. Furthermore, we demonstrated that RvLEAM protein as well as MAHS protein improved the hyperosmotic tolerance of human cells. The findings of the present study revealed that tardigrade mitochondria contain at least two types of heat-soluble proteins that might have protective roles in water-deficient environments.


Subjects
Mitochondrial Proteins/metabolism , Osmoregulation , Osmotic Pressure , Tardigrada/physiology , Amino Acid Motifs , Amino Acid Sequence , Animals , Hot Temperature , Humans , Mitochondria/metabolism , Mitochondrial Proteins/chemistry , Mitochondrial Proteins/genetics , Molecular Sequence Data , Osmoregulation/genetics , Protein Transport , Solubility
19.
Nucleic Acids Res ; 43(Database issue): D18-22, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25477381

ABSTRACT

The DNA Data Bank of Japan Center (DDBJ Center; http://www.ddbj.nig.ac.jp) maintains and provides public archival, retrieval and analytical services for biological information. Since October 2013, DDBJ Center has operated the Japanese Genotype-phenotype Archive (JGA) in collaboration with our partner institute, the National Bioscience Database Center (NBDC) of the Japan Science and Technology Agency. DDBJ Center provides the JGA database system which securely stores genotype and phenotype data collected from individuals whose consent agreements authorize data release only for specific research use. NBDC has established guidelines and policies for sharing human-derived data and reviews data submission and usage requests from researchers. In addition to the JGA project, DDBJ Center develops Semantic Web technologies for data integration and sharing in collaboration with the Database Center for Life Science. This paper describes the overview of the JGA project, updates to the DDBJ databases, and services for data retrieval, analysis and integration.


Subjects
Nucleic Acid Databases , Genotype , Phenotype , Genetic Association Studies , Humans , Internet , DNA Sequence Analysis
20.
BMC Bioinformatics ; 15 Suppl 14: S7, 2014.
Article in English | MEDLINE | ID: mdl-25472764

ABSTRACT

BACKGROUND: Computational biology comprises a wide range of technologies and approaches. Multiple technologies can be combined to create more powerful workflows if the individuals contributing the data or providing tools for its interpretation can find mutual understanding and consensus. Much conversation and joint investigation are required in order to identify and implement the best approaches. Traditionally, scientific conferences feature talks presenting novel technologies or insights, followed up by informal discussions during coffee breaks. In multi-institution collaborations, in order to reach agreement on implementation details or to transfer deeper insights in a technology and practical skills, a representative of one group typically visits the other. However, this does not scale well when the number of technologies or research groups is large. Conferences have responded to this issue by introducing Birds-of-a-Feather (BoF) sessions, which offer an opportunity for individuals with common interests to intensify their interaction. However, parallel BoF sessions often make it hard for participants to join multiple BoFs and find common ground between the different technologies, and BoFs are generally too short to allow time for participants to program together. RESULTS: This report summarises our experience with computational biology Codefests, Hackathons and Sprints, which are interactive developer meetings. They are structured to reduce the limitations of traditional scientific meetings described above by strengthening the interaction among peers and letting the participants determine the schedule and topics. These meetings are commonly run as loosely scheduled "unconferences" (self-organized identification of participants and topics for meetings) over at least two days, with early introductory talks to welcome and organize contributors, followed by intensive collaborative coding sessions. We summarise some prominent achievements of those meetings and describe differences in how these are organised, how their audience is addressed, and their outreach to their respective communities. CONCLUSIONS: Hackathons, Codefests and Sprints share a stimulating atmosphere that encourages participants to jointly brainstorm and tackle problems of shared interest in a self-driven proactive environment, as well as providing an opportunity for new participants to get involved in collaborative projects.


Subjects
Computational Biology , Cooperative Behavior , Software , Communication , Internet